Analysis of the Effect of Unexpected Outliers in the Classification of Spectroscopy Data
Abstract
Multi-class classification algorithms are very widely used, but we argue that they are not always ideal from a theoretical perspective, because they assume that all classes are characterised by the training data, whereas in many applications training data for some classes may be entirely absent, rare, or statistically unrepresentative. We evaluate one-sided classifiers as an alternative, since they assume that only one class (the target) is well characterised. We consider the task of identifying whether a substance contains a chlorinated solvent, based on its chemical spectrum. For this application, it is not feasible to collect a statistically representative set of outliers, since that group may contain anything apart from the target chlorinated solvents. Using a new one-sided classification toolkit, we compare a One-Sided k-NN algorithm with two well-known binary classification algorithms, and conclude that the one-sided classifier is more robust to unexpected outliers.
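To make the idea of a one-sided classifier concrete, the following is a minimal sketch of one common one-class k-NN formulation. It is not the toolkit or the exact One-Sided k-NN algorithm evaluated in the paper, which the abstract does not specify. The names `knn_score`, `OneSidedKNN` and `is_target`, the mean-distance score, and the quantile-based threshold are all assumptions made for illustration. The model is fitted on target samples only, and a new sample is accepted as a target when its mean distance to its k nearest target samples does not exceed a threshold learned from the targets themselves.

```python
import math


def knn_score(x, targets, k, skip=None):
    """Mean Euclidean distance from x to its k nearest target samples.

    `skip` optionally excludes one training index (used for leave-one-out).
    """
    dists = sorted(
        math.dist(x, t) for i, t in enumerate(targets) if i != skip
    )
    return sum(dists[:k]) / k


class OneSidedKNN:
    """Illustrative one-class k-NN: trained on target samples only."""

    def __init__(self, k=3, quantile=0.95):
        self.k = k                # number of neighbours (assumed default)
        self.quantile = quantile  # fraction of targets to accept (assumed)

    def fit(self, targets):
        self.targets = [tuple(t) for t in targets]
        if len(self.targets) <= self.k:
            raise ValueError("need more than k target samples")
        # Leave-one-out score of every target sample against the others.
        loo = sorted(
            knn_score(t, self.targets, self.k, skip=i)
            for i, t in enumerate(self.targets)
        )
        # Threshold: the chosen quantile of the leave-one-out scores.
        idx = min(len(loo) - 1, int(self.quantile * len(loo)))
        self.threshold = loo[idx]
        return self

    def is_target(self, x):
        """True if x lies as close to the targets as the targets do to each other."""
        return knn_score(x, self.targets, self.k) <= self.threshold
```

Because no outlier samples are used during fitting, an unexpected outlier far from every target sample is rejected regardless of what kind of substance it is, which is the property the abstract contrasts with binary classifiers trained on an unrepresentative outlier set.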
Related resources
Introduction Package CircOutlier For Detection of Outliers in Circular-Circular Regression
One of the most important problems in any statistical analysis is the existence of unexpected observations. Some observations are not a part of the study and are known as outliers. Studies have shown that outliers affect the performance of standard statistical methods in models and predictions. The point of this work is to provide a couple of statistical package in R software to identi...
The Unexpected Effect of Sodium Arsenate on the Interaction between Histone H1 and Sodium N-Dodecyl Sulphate
The interaction between histone H1 and sodium n-dodecyl sulphate (SDS) was studied in the presence of sodium arsenate in a phosphate buffer of pH 6.4, using spectroscopy and equilibrium dialysis at 27 °C. The binding data have been used to obtain the Gibbs free energy in terms of a theoretical model based on the Wyman binding potential. The binding data have been analysed...
Metabolomics-Based Study of Logarithmic and Stationary Phases of Promastigotes in Leishmania major by 1H NMR Spectroscopy
Background: Cutaneous leishmaniasis is one of the most important parasitic diseases in humans. One of the organisms responsible for this disease is Leishmania major, which is transmitted by the sandfly vector. There are specific differences in biochemical profiles and metabolite pathways in logarithmic and stationary phases of Leishmania parasites. In the present study, 1H NMR spectroscopy was used...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Impact of Outliers in Data Envelopment Analysis
This paper examines the relationship between "Data Envelopment Analysis" and the statistical concept of an "outlier". Data envelopment analysis (DEA) is a method for estimating the relative efficiency of decision making units (DMUs) that perform similar tasks in a production system, using multiple inputs to produce multiple outputs. An important issue in statistics is to identify the outliers. In this pap...
Investigation of outliers of evaluation scores among school of health instructors using outlier - determination indices
Introduction: Teacher evaluation, as an important strategy for improving the quality of education, has been considered by universities and leads to a better understanding of the strengths and weaknesses of education. Analysis of instructors’ scores is one of the main fields of educational research. Since outliers affect analysis and interpretation of information processes both structurally and concep...